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Abstract 

We merge in this note our two discussions about the Read Paper "Particle Markov 
chain Monte Carlo" (Andrieu, Doucet, and Holenstein, 2010) presented on October 16 th 
2009 at the Royal Statistical Society, appearing in the Journal of the Royal Statistical 
Society Series B. We also present a more detailed version of the ABC extension. 

1 Introduction 

The article Andrieu et al. (2010) is clearly going to have significant impact on scientific 
disciplines with a strong interface with computational statistics and non-linear state space 
models. Our comments are based on practical experience with PMCMC implementation 
in latent process multifactor SDE models for commodities (Peters et al., 2009), wireless 
communications (Nevat et al., 2009) and population dynamics (Hayes et al., 2009), using 
Rao-Blackwellised particle niters (Doucet et al., 2000) and adaptive MCMC (Roberts and 
Rosenthal, 2009). 

2 Generic comments 

• From our implementations, ideal use cases consist of highly non-linear dynamic equa- 
tions for a small dimension d x of the state-space, large dimension do of the static 
parameter, and potentially large length T of the time series. In our cases d x was 2 or 
3, de up to 20, and T between 100 and 400. 
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• In PMH, non-adaptive MCMC proposals for 9 (e.g. tuned according to pre-simulation 
chains or burn-in iterations) would be costly for large T, and requires to keep N fixed 
over the whole run of the Markov chain. Adaptive MCMC proposals such as the 
Adaptive Metropolis sampler (Roberts and Rosenthal, 2009), avoid such issues and 
proved particularly relevant for large dg and T, as can be seen in Figure 2. 

• The Particle Gibbs (PG) could potentially stay frozen on a state x\./r(i). Consider a 
state space model with state transition function almost linear in x n for some range 
of 9, from which yi-T is considered to result, and strongly non-linear elsewhere. If 
the PG samples 9(i) in those regions of strong non-linearity, the particle tree would 
likely coalesce on the trajectory preserved by the conditional SMC, leaving it with a 
high importance weight, maintaining (9(i + 1), Xi-t{i + 1)) = (9(i),Xi-T{i)) over several 
iterations. Using PMH within PG would help escape this region, especially using PRC 
and adaptive SMC kernels, outlined in another comment, to fight the degeneracy of the 
filter and the high variance of peiyi-.x)- 



3 Adaptive Sequential Monte Carlo 



Our comments on adaptive SMC relate to Particle marginal Metropolis-Hastings (PMMH) 
which has acceptance probability given in Equation (13) of the read paper for proposed state 
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suffices to approximate the mode of joint path space distribution 
proposal for Xi : t, it results in high variance estimates of pg* (ui-.t) 
dynamics example from (Hayes et al., 2009, Model 3 excerpt), involving a log-transformed 
theta-logistic state space model, see (Wang, 2007, Equation 3(a), 3(b)) for parameter settings. 
PMCMC performance depends on the trade-off between degeneracy of the filter, N, and 
design of the SMC mutation kernel. Regarding the latter: 



A Rao-Blackwellised filter (Doucet et al. 
et al. (see 2009). 



2000) can improve acceptance rates, Nevat 



Adaptive mutation kernels, which in PMCMC, can be considered as adaptive SMC 
proposals, can reduce degeneracy on the path space, allowing for higher dimensional 
state vectors x n . Adaption can be local (within filter) or global (sampled Markov 
chain history). Though currently particularly designed for ABC methods, the work of 
Peters et al. (2008) incorporates into the mutation kernel of SMC Samplers (Del Moral 
et al., 2006) the Partial Rejection Control (PRC) mechanism of Liu (2001), which is 
also beneficial for PMCMC. PRC adaption reduces degeneracy by rejecting a particle 
mutation when its incremental importance weight is below a threshold c„. The PRC 
mutation kernel 
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can also be used in PMH, where q$(x n \y n , x n -i) is the standard SMC proposal, and 
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As presented in Peters et al. (2008), algorithmic choices for qg(x n \y n , x n -i) can avoid 
evaluation of r(c n ,x n _i). Cornebise (2009b) extend this work, developing PRC for 
Auxiliary SMC samplers, also useful in PMH. Threshold c n can be set adaptively: 
locally either at each SMC mutation or Markov chain iteration; or globally based on 
chain acceptance rates. Additionally, c n can be set adaptively via quantile estimates of 
pre-PRC incremental weights, see Peters et al. (2009). 

Cornebise et al. (2008) state that adaptive SMC proposals can be designed by mini- 
mizing function-free risk theoretic criteria such as Kullback-Leibler divergence between 
a joint proposal in a parametric family and a joint target. Cornebise (2009a, Chapter 
5) and Cornebise et al. (2009) use a mixture of experts, adapting kernels of a mixture 
on distinct regions of the state-space separated by a softmax partition. These results 
extend to PMCMC settings. 




Population dynamics state space model with log transformed theta-logistic latent process. 



-Generated latent state realisations 
- Generated observations 



Figure 1: Sequence of simulated states and observations for the population dy- 
namic log-transformed theta logistic model from Wang (2007), with static parame- 
ter 9 = (r, (, K) under constraints K > 0, r < 2.69, ( G R. State transi- 
tion is fo(x n \x n -i) = Af (x n ; x n _i + r (l — (exp(x t _i)/K) < ') , 0.01), and local likelihood is 
9e{yn\x n ) =N(y n ; x n , 0.04), for T = 100 timesteps. 



3 



Sample path for latent state value x in the PMH via an SIR filter. 



i ' i i i i i i i i 



Markov chain iteration i 



Sample path for latent state value x in the PMH via an SIR filter. 



Markov chain iteration i 



Sample path for latent state value x in the PMH via an SIR filter. 
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Markov chain iteration i 



Sample path for parameter r in the PMH via an adaptive Metropolis proposal. 




Markov chain iteration i 

Sample path for parameter K in the PMH via an adaptive Metropolis proposal. 
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Markov chain iteration i x 10 1 

Sample path for parameter £ in the PMH via an adaptive Metropolis proposal. 
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Markov chain iteration i 



Figure 2: Path of three sampled latent states X2, X37, X93, and of the sampled parameters 
9 = (r,(,L), over 100,000 PMH iterations based on N = 200 particles using a simple SIR 
filter - the one dimensional state did not call for Rao-Blackwellisation. Note also the effect 
the Adaptive MCMC proposal for 9, set-up to start at iteration 5, 000, particularly visible on 
the mixing of parameter K. The most noticeable property of the algorithm is the remarkable 
mixing of the chain, in spite of the high total dimension of the sampled state: each iteration 
involves a proposal of (X 1:T ,9) of dimension 103. 
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Initialised PMH state for x 
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Average poster mean path estimate for X^, PMH Markov chain iteration i=10, N=200 particles 
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Average poster mean path estimate for X 1-T , PMH Markov chain iteration i=20,000, N=200 particles 
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Figure 3: Convergence of the distribution of the path of latent states xi-t- Note the change 
of vertical scale. Initializing PMH on a very unlikely initial path does not prevent the MMSE 
estimate of the latent states converging: as few as 10 PMH iterations already begins to 
concentrate the sampled paths around the true path - assumed here to be close to the mode 
of the posterior distribution thanks to the small observation noise with very satisfactory 
results after 20, 000 iterations. 
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4 Approximate Bayesian Computation and PMCMC 



For intractable joint likelihood PeiVi-.T^i-.T)-, we could design a SMC-ABC algorithm (see 
e.g. Peters et al., 2008; Ratmann, 2010, Chapter 1) for a fixed ABC tolerance e, using the 
approximations 
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with p a distance on the observation space and y*(s) ~ ge{-\x^) simulated observations. Ad- 
ditional degeneracy on the path space induced by ABC approximation should be controlled, 
e.g. with PRC (Peters et al., 2008), see Equation (1). More details on this algorithm are 
available in Appendix A, which is not contained in our comment to JRSSB due to space 
restrictions. 



A Algorithmic details of ABC filtering within PMCMC 

Here we expand on the comment we made above in which we approximated the local like- 
lihood go{y n \x n ) of the SMC-based filtering part of the PMCMC algorithm, by the ABC 
approximation 

1 S 

g£ BC (yn\x n ,yn(tS),e) ■= -^2n e (y n (s)\x n ,y n ,e) (3) 

s=l 

where possible choices for ttq are 

4 (y n (s)\x n ,y n ,e) :=I(p(y n (s),y n ) < e) or itf (y n (s)\x n ,y n , e) := M (y n (s); y n , e 2 ) 

with p a distance on the observation space and y n (s) ~ gg(-\x n ) simulated observations - 
assumed here to be univariate for sake of brevity, but generalisation to multivariate setting 
and summary statistics is straightforward. 

We note it is critical in the filtering context to ensure that the particle system under 
approximation does not collapse into uniformly null incremental weights w n (Xf. n ) = which 
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Algorithm 1 SMC-ABC-PRC filtering algorithm targeting Po(%uT\yi:T) as required in Step 
2(b) of the PMMH of (Andrieu et al., 2010, Section 2.4.2). Replaces the SMC algorithm pre- 
sented in (Andrieu et al., 2010, Section 2.2.1). Approximation g$ BC is defined in Equation (3) 
and function r in Equation (2). 

Step 1: Initialize e and c\ 
Step 2: At time n = 1, 

(a) for k = 1, . . . , N 

(i) sample X\ ~ qe(-\yi) 

(ii) sample Yf(s) ~ g e (-\X k ) for s = 1, . . . , S 

(iii) compute the incremental weight 

~ ( yk , fJ»(XZ)g™C( yi \x><,Y 1 k (l:S),e) 

w n A J •- T^fci — \ > 

Qe{Xf\yi) 

(iv) with probability 1 — p\ = 1 — min{l, wi(X k )/ci}, reject X k and go to (i) 

(v) otherwise, accept X k and set 

w 1 (X k )=w 1 (X k )r(c 1 )/p k 

(b) normalise the weights W k := w 1 (X k )/ J2m=i w i{X™). 
Step 3: At times n = 2, . . . , T, 

(a) possibly adapt c n online 

(b) for k = 1, . . . , N 

(i) sample A k n _ x ~ ^(.|W^i), 

(ii) sample X k n ~ q e (-\y n , X n ^) and set X\. n = (X n ^\X k ), and 

(iii) sample Y k (s) ~ g$(-\X%) for s = 1, . . . , S 

(iv) compute the incremental weight 



Wn(X k : 



fe{X k n \xth 1 )9i BC {yn\XlY*{V.S),e) 



q e (X^\y n ,X n ^) 



(v) with probability 1 — p k = 1 — min{l , w n (X k . n ) / c n } , reject X k and go to (ii) 

(vi) otherwise, accept X k and set 

w n (X k J = w^X'jric^xfr^/Pn 
(c) normalise the weights W k := w n (X^ n )/ J2m=i w n(X™ n ). 
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may occur for ic B at any stage of the filtering during each PMCMC iteration, especially for 
small tollerances e. The PRC mutation kernel qg(x n \y n , x n _i) defined in Equation (1) is 
critical to overcome both this collapse and the additional degeneracy on the path space 
introduced by the ABC approximation. The algorithm presented in McKinley et al. (2009, 
Sections 3.4 and 3.5.1) is a special case of the SMC samplers PRC-ABC algorithm of Peters 
et al. (2008) in which the PRC rejection threshold c n = 0, the mutation kernel is global 
and resampling is performed at each stage of the filter, which avoids the computation of 
the normalizing constant r(c n ,x n _i) defined in Equation (2). We further note that the 
work of Cornebise (2009a) casts the SMC sampler PRC algorithm (Peters et al., 2008) with 
rejection of the ancestor index A\_ x - here in Step 3.(a).(v) - into an Auxiliary SMC sampler 
framework. The combination of these two concepts recovers a generalized version of McKinley 
et al. (2009), see Algorithm 1, which has advantage in the PMCMC setting of allowing for 
adaptation of the threshold c n . 
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